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DifFusion tensor imaging provides important information on tis- 
sue structure and orientation of fiber tracts in brain white matter in 
Q^ ' vivo. It results in diffusion tensors, which are 3x3 symmetric positive 

.^p , definite (SPD) matrices, along fiber bundles. This paper develops a 

functional data analysis framework to model diffusion tensors along 
j^ ' fiber tracts as functional data in a Riemannian manifold with a set 

of covariates of interest, such as age and gender. We propose a statis- 
tical model with varying coefficient functions to characterize the dy- 
namic association between functional SPD matrix-valued responses 
and covariates. We calculate weighted least squares estimators of the 
varying coefficient functions for the log-Euclidean metric in the space 
of SPD matrices. We also develop a global test statistic to test spe- 
cific hypotheses about these coefficient functions and construct their 
simultaneous confidence bands. Simulated data are further used to 
examine the finite sample performance of the estimated varying co- 
N ' efficient functions. We apply our model to study potential gender 

differences and find a statistically significant aspect of the develop- 
ment of diffusion tensors along the right internal capsule tract in a 
clinical study of neurodevelopment. 
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1. Introduction. Diffusion Tensor Imaging (DTI), which measures the 
effective diffusion of water molecules, can provide important information on 
the microstructure of fiber tracts and the major neural connections in white 
matter [Basser, Mattiello and LeBihan (1994a, 1994b)]. It has been widely 
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used to assess the integrity of anatomical connectivity in white matter. In 
DTI, a 3 X 3 symmetric positive definite (SPD) matrix, called a diffusion 
tensor (DT), and its three eigenvalue-eigenvector pairs {(A^, v^) -.k = 1,2,3} 
with Ai > A2 > A3 are estimated to quantify the degree of diffusivity and 
the directional dependence of water diffusion in each voxel (volume pixel). 
Multiple fiber tracts in white matter can be constructed by consecutively 
connecting the estimated principal directions (vi) of the estimated DTs 
in adjacent voxels [Basser et al. (2000)]. Subsequently, some tensor-derived 
scalar quantities, such as fractional anisotropy (FA) and mean diffusivity 
(MD), are commonly estimated along these white matter fiber tracts for 
each subject. Specifically, MD = (Ai -|- A2 -|- A3)/3 describes the amount of 
diffusion, whereas FA describes the relative degree of anisotropy and is given 
by 



(l_^) PA^ /3{(A^- A)2 + (A2 - A)2 + (As - A)2} 



V 2(Af + Ai + Ai) 

In the recent DTI literature, there is an extensive interest in developing 
fiber-tract based analysis for comparing DTIs in population studies [Gold- 
smith et al. (2011), Goodlett et al. (2009), O'Donnell, Westin and Golby 
(2009), Smith et al. (2006), Yushkevich et al. (2008), Zhu et al. (2010, 2011)]. 
The reason is that the region-of-interest (ROI) method primarily computes 
averages of diffusion properties in some manually drawn ROIs, generates 
various summary statistics per ROI, and then carries out statistical analysis 
on these summary statistics. This method suffers from identifying meaning- 
ful ROIs, particularly the long curved structures common in fiber tracts, 
the instability of statistical results obtained from ROI analysis, and the par- 
tial volume effect in relative large ROIs [Zhu et al. (2011)]. The fiber-tract 
based analysis usually consists of two major components, including DTI at- 
las building and a follow-up statistical analysis [Goodlett et al. (2009), Smith 
et al. (2006), Zhu et al. (2010)]. The DTI atlas building is primarily to ex- 
tract DTI fibers and to establish DTI fiber correspondence across all DTI 
data sets from different subjects. The key steps of the DTI atlas building 
include DTI registration, atlas fiber tractography and fiber parametrization. 
Finally, we get a set of individual tracts with the same corresponding geom- 
etry but varying DTs and diffusion properties. Some statistical approaches 
have been developed for the analysis of scalar tensor-derived quantities along 
fiber tracts [Goldsmith et al. (2011), Goodlett et al. (2009), Smith et al. 
(2006), Yushkevich et al. (2008), Zhu, Li and Kong (2010), Zhu et al. (2010, 
2011)], but little has been done on the analysis of whole DTs along fiber 
tracts, which is the focus of this paper. 

There is a growing interest in the DTI literature in developing statistical 
methods for the direct analysis of DTs in the space of SPD matrices [Dry- 
den, Koloydenko and Zhou (2009)]. Schwartzman, Mascarenhas and Taylor 
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(2008) proposed parametric models for analyzing SPD matrices and de- 
rived the distributions of several test statistics for comparing differences 
between the means of the two (or multiple) groups of SPD matrices. Kim 
and Richards (2011) developed a nonparametric estimator of the density 
function of a random sample of SPD matrices. Zhu et al. (2009) developed 
a semiparametric regression model with SPD matrices as responses and co- 
variates in a Euclidean space. Barmpoutis et al. (2007) and Davis et al. 
(2010) developed nonparametric methods, including tensor spline methods 
and local constant regression, to interpolate diffusion tensor fields. However, 
no one has ever developed statistical methods for functional analysis of DTs 
along fiber tracts. 

In this paper, we propose a varying coefficient model for DT-valued func- 
tions ( VCDF) . We use varying coefficient functions to characterize the vary- 
ing association between diffusion tensors along fiber tracts and a set of co- 
variates. Here, the varying coefficients are the parameters in the model which 
vary with location. Since the impacts of the covariates of interest may vary 
spatially, it would be more sensible to treat the covariates as functions of lo- 
cation instead of constants, which leads to varying coefficients. In addition, 
we explicitly model the within-subject correlation among multiple DTs mea- 
sured along a fiber tract for each subject. To account for the curved nature 
of the SPD space, we employ the log-Euclidean framework in Arsigny (2006) 
and then use a weighted least squares estimation method to estimate the 
varying coefficient functions. We also develop a global test statistic to test 
hypotheses on the varying coefficient functions and use a resampling method 
to approximate the p- value. Finally, we construct a simultaneous confidence 
band to quantify the uncertainty of each estimated coefficient function and 
propose a resampling method to approximate its critical points. To the best 
of our knowledge, this is the first paper for developing a statistical frame- 
work for modeling functional manifold-valued responses with covariates in 
Euclidean space. 

There are several advantages of the analysis of DTs over the analysis of 
scalar diffusion properties along fiber tracts. The first one is that it can avoid 
the statistical artifacts, including biased parameter estimates and incorrect 
test statistics and p- values for hypotheses of interest, created by compar- 
ing the biased diffusion properties along fiber bundles. This is because the 
real DT data estimated from the diffusion weighted images (DWIs) using 
weighted least squared methods are almost unbiased [Zhu et al. (2007b)], 
whereas the diffusion properties, which are nonlinear and linear functions of 
three eigenvalues of DT data, may be substantially different from the true 
diffusion properties [Anderson (2001), Pierpaoli and Basser (1996), Zhu et al. 
(2007b)]. In addition, as shown in Yuan et al. (2012), directly modeling DTs 
along fiber bundles as a smooth SPD process allows us to incorporate a 
smoothness constraint to further reduce noise in the estimated DTs along 
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the fiber bundles. This leads to the further reduction of noise in estimated 
scalar diffusion properties along the fiber bundles and less biased estimators 
of diffusion properties as shown in Figure 4 in Section 3. Moreover, the sole 
use of diffusion properties, which ignores the directional information of DT, 
can decrease the statistical power in detecting the difference in DTs oriented 
in different directions. 

The rest of the paper is organized as follows. Section 2 presents VCDF 
and related statistical inference. Section 3 examines the finite sample perfor- 
mance of VCDF via a simulation study. Section 4 illustrates an application 
of VCDF in a clinical study of neurodevelopment. Section 5 presents con- 
cluding remarks. 

2. Data and methods. 

2.1. Early brain development study of white matter tracts. We consider 
96 healthy infants (36 males and 60 females) from the neonatal project 
on early brain development led by Dr. Gilmore at the University of North 
Carolina at Chapel Hill. The mean gestational age of these infants is 245.6 
days with SD: 18.5 days (range: 192-270 days). A 3T Allegra head only MR 
system was used to acquire all the images. The system was equipped with 
a maximal gradient strength of 40 mT/m and a maximal slew rate of 400 
mT/(m • msec). The DTIs were obtained by using a single shot EPI DTI 
sequence (TR/TE = 5400/73 msec) with eddy current compensation. The 
six noncollinear directions at the 5-value of 1000 s/mm'^ with a reference 
scan {b = 0) were applied. The voxel resolution was isotropic 2 mm, and the 
in-plane field of view was set to 256 mm in both directions. To improve the 
signal-to-noise ratio of the DTIs, a total of five repetitions were acquired 
and averaged. 

We processed the DTI data set as follows. We used a weighted least 
squares estimation method [Basser, Mattiello and LeBihan (1994a), Yuan 
et al. (2008), Zhu et al. (2007b)] to construct the diffusion tensors. We used 
a DTI atlas building pipeline [Goodlett et al. (2009), Zhu et al. (2010)] to 
register DTIs from multiple subjects to create a study-specific unbiased DTI 
atlas, to track fiber tracts in the atlas space, and to propagate them back 
into each subject's native space by using registration information. Then, we 
calculated DTs and their scalar diffusion properties at each location along 
each individual fiber tract by using DTs in neighboring voxels close to the 
fiber tract. Since the description of the DTI atlas building has been described 
in detail [Goodlett et al. (2009), Zhu et al. (2010)], we do not include these 
image processing steps here for the sake of simplicity. Figure 1(a) displays 
the fiber bundle of the right internal capsule fiber tract (RICFT), which is 
an area of white matter in the brain. The internal capsule, which lies be- 
tween the lenticular and caudate nuclei, consists of a group of myelinated 
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Fig. 1. (a) The fiber bundle of the right internal capsule fiber tracts m the atlas space, (b) 
The ellipsoidal representations of full tensors along a representative right internal capsule 
fiber tract obtained from each of 10 selected subjects, colored with fractional amsotropy 
(FA) values. The rainbow color scheme is used with red corresponding to low FA value and 
purple corresponding to high FA value. 

fiber tracts including axons of pyramidal and extrapyramidal upper motor 
neurons that connect the cortex to the cell bodies of lower motor neurons. 
Although the internal capsule ends within the cerebrum, the axons that 
pass through it continue down through brain stem and spinal cord. It was 
found that neonatal microstructural development of the internal capsule 
tract correlates with severity of gait and motor deficits [Rose et al. (2007)] . 
Figure 1(b) presents DTs along a representative RICFT obtained from each 
of 10 subjects, in which each DT is geometrically represented by an ellip- 
soid. In this ellipsoidal representation, the lengths of the semiaxes of the 
ellipsoid equal the square root of the three eigenvalues of a DT, while the 
three eigenvectors define the direction of the three axes. 

Our final data set includes DTs and diffusion properties sampled along 
the RICFT and a set of covariates of interest from all n = 96 subjects. 
Specifically, let Sym"''(3) be the set of 3 x 3 SPD matrices and Xj G [0,Lo] 
be the arc length of the jth point on the RICFT relative to a fixed end point 
for j = 1, . . . ,nc, where Lq is the longest arc length and uq is the number 
of points on the RICFT. For the ith subject, there is a diffusion tensor at 
the jth point on the RICFT, denoted by Si{xj) G Sym"'"(3), for i = 1, . . . ,n. 
Let Zj be an r X 1 vector of covariates of interest. In this study, we have two 
specific aims. The first one is to compare DTs along the RICFT between the 
male and female groups. The second one is to delineate the development of 
fiber DTs across time, which is addressed by including the gestational age 
at MRI scanning as a covariate. Finally, our real data set can be represented 
as {(zj; (xi, S'i(xi)), . . . , (x^^, ^^(x^g))) : i = 1, . . . , n}. 

2.2. Varying coefficient model for SPD matrix-valued functional data. In 
this section we present our VCDF. The code for VCDF written in Matlab 
along with its documentation and a sample data set will be freely accessi- 
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ble from http://www.bios.unc.edu/research/bias/software.html. To 
make the code easily accessible, we developed a Graphical User Interface 
(GUI), also freely downloadable from the same website. 

To proceed, we need to introduce some notation. Let Sym(3) be the set 
of 3 X 3 symmetric matrices with real entries. For any A = (ofc/) G Sym(3), 
we define vecs(A) = (011,021,022,031,032,033)"^ to be a 6 x 1 vector and 

vec(A) = (011,012,013,021,022,023,031,032,033)'^ 

to be a 9 X 1 vector. Let Ivecs(-) be the inverse operator of vecs(-) such 
that Ivecs(vecs(^)) = A for any A G Sym(3). The matrix exponential oi Ag 
Sym(3) is given by exp(A) = E^=o^"V"i! ^ Sym+(3). For any 3 x 3 SPD 
matrix S, there is a logarithmic map of S, denoted as log(S') = A G Sym(3), 
such that exp(y4) = S. Let a®^ = aa for any vector or matrix a. 

Since the space of SPD matrices is a curved space, we use the log-Euclidean 
metric [Arsigny (2006)] to account for the curved nature of the SPD space. 
Specifically, we take the logarithmic map of the DTs Si{x) E Sym^(3) to 
get log(Si{x)) E Sym(3), which has the same effective dimensionality as a 
six-dimensional Euclidean space. Thus, we only model the lower triangular 
portion of log{Si{x)) as follows: 

(2.1) vecs(log(5'i(x))) = B{x)zi + Ui{x) + ei{x), 

where B{x) is a 6 x r matrix of varying coefficient functions for character- 
izing the dynamic associations between Si{x) and Zj, Uj(rc) is a 6 x 1 vector 
characterizing the within-subject correlation between the log-transformed 
DTs, and £i{x) is a 6 x 1 vector of measurement errors. It is also assumed 
that £i{x) and Ui{x) are independent and identical copies of SP(0,Se) and 
SP(0,Su), respectively, where SP(0,5]) denotes a stochastic process with 
mean and covariance matrix function S(x,rc') for any x,x' E [0,Lo]. Let 
l(-) be an indicator function. Assume that ei{x) and £i{x') for x ^ x' are 
independent and, thus, $]e(x,x') = Se(x,x)l(x = x'). It follows that the co- 
variance structure of vecs(log(5j(rEj))), denoted by 115(2;, x'), is given by 

(2.2) S5(x, x') = Eu(2;, x') + Se(x, x)l{x = x'). 

Model (2.1) is a multivariate varying coefficient model with a 6 x 1 vec- 
tor response and, thus, it can be regarded as a generalization of univariate 
varying coefficient models, which have been widely studied and developed 
for longitudinal, time series and functional data [Fan, Yao and Cai (2003), 
Fan and Zhang (1999, 2008) Wang, Li and Huang (2008), Wu and Ghiang 
(2000)]. 

2.3. Weighted least squares estimation. Before estimating the varying 
coefficient functions in B{x), we need to introduce a few facts about the 
log-Euclidean metric for the space of SPDs [Arsigny (2006)]. The use of 
the log-Euclidean metric results in classical Euclidean computations in the 
domain of matrix logarithms. Particularly, under the log-Euclidean metric. 
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the geodesic distance between Si and 5*2 in Sym"^(3) is uniquely given by 



(2.3) d{Si,S2) = Vtr[{log(Si) -log(52)}^'], 

which equals the Euclidean distance between log(5i) and log(52) in Eu- 
clidean space Sym(3). However, there is a subtle, but important, difference 
between regarding S{x) as a single point in Sym"'"(3) and treating log(S(a;)) 
as a vector in Euclidean space. By regarding S{x) as a point in Sym~''(3), 
we treat all elements of S{x) as a single unit and use a single bandwidth 
to smooth DTs. In contrast, by treating log(S(a;)) as a vector in Euclidean 
space, traditional smoothing methods smooth each element of log(S(a;)) in- 
dependently [Fan and Gijbels (1996), Wand and Jones (1995), Wu and Zhang 
(2006)]. 

We use the local linear regression method and the weighted least squares 
estimation to estimate B{x) [Fan and Gijbels (1996), Ramsay and Silver- 
man (2005), Wand and Jones (1995), Welsh and Yee (2006), Wu and Zhang 
(2006), Zhang and Chen (2007)]. Since the local linear regression method 
adapts automatically at the boundary points [Fan and Gijbels (1992)], it 
is ideal for dealing with DTs and scalar diffusion properties along fiber 
tracts with two ends (see Figure 1). Let h^^' be a given bandwidth, B{x) = 
dB{x)/dx be a 6 X r matrix, and Ir be the r x r identity matrix. Using 
Taylor's expansion, we can expand B{xj) at x to obtain 

(2.4) B{xj) w B{x) + B{x){xj - x) = B^w{x){Ir ® y^w [xj - x)}, 

where yh(xj — x) = (l,(xj — x)/h)'^ and i?^(i)(x) = [B{x),h^^'>B{x)] is a 
6 X 2r matrix. Based on (2.1) and (2.4), B{xj)zi can be approximated by 
Bf^(i){x){Ir ^yfiwixj — x)}zi. For a fixed bandwidth h^^', we can calcu- 
late a weighted least squares estimate of Bj^(i){x), denoted by B^{i){x) = 
[B{x]h^^')^h^^' B{x]h^^')], by minimizing an objective function given by 

n UQ 

^^Kf^w{xj-x) 

i=l 7=1 

(2.5) 

X d{\og{Si{xj)),\\ecs{B^(i){x){Ir®yh(i){xj - x)]zi)) , 

where Ky^{i) (•) = K [■ / h^^' ) / h^^' is rescaling of the kernel function K[-), such 
as the Gaussian or uniform kernel [Fan and Gijbels (1996), Wand and Jones 
(1995)]. The explicit form of B{x; h^^') can be found in Appendix C. 

We pool the data from all n subjects and develop a cross-validation 
method to select an estimated bandwidth /i'^\ denoted by he . Let B[x; 
/j(i))(-«) bg the weighted least squares estimator of B[x) for the bandwidth 
h^^> based on the observations with the ith subject excluded. We define a 
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cross-validation score, denoted by CVi(/i^^'), as follows: 



n no 



(2.6) CVi(/i«) = (nnG)-ij;j;d(log(5,(x,)),Ivecs(S(x;/i«)(-*)z,))'. 

We select he by minimizing C\i{h^^>). In practice, within a given range of 

h^^\ the value of he can be approximated by computing C\i{h^^>) through 
a series of h^^' . Finally, we can calculate a weighted least squares estimate 
of B{x), denoted by Be{x) = B{x; he )• 

2.4. Smoothing individual functions and estimating covariance matrices. 
To simultaneously construct the individual function Uj(a;), we also employ 
the local linear regression method. Let Ui{x) = dui{x)/dx. Taylor's expan- 
sion of Ui{xj) at X gives 

(2.7) Ui{xj) w Ui(x) + Ui{x){xj -x) = Ui{x)y,^(2){xj - x), 

where Ui{x) = [uj(x),/i^^'Uj(x)] is a 6 x 2 matrix. For each fixed x and each 
bandwidth /i'^\ the weighted least square estimator of Ui{x), denoted by 
Ui{x;h^'^') = [ui{x;h^'^^),h^'^'Ui{x]h'''^')], can be calculated by minimizing an 
objective function given by 

no 

^K,^(2){xj - x)d(log{Si{xj)),lvecs{Be{xj)zi + Ui{x)y,^(2){xj - x)))' 



\2 



Let Ri be an uq x 6 matrix with the jth row vecs(log(«S'j(xj))) — Be{xj)zi 



and S be an uq x uq smoothing matrix with the (z, j)th element K'? 



Xi,Xi), where A'w2)(t) is the empirical equivalent kernel [Fan and Gijbels 
(1996)]. It can be shown that 

(2.8) (uj(xi),...,Ui(x„g)) =SRi. 

We pool the data from all n subjects and select an estimated bandwidth of 
h^"^' , denoted as hi . We define a generalized cross-validation score, denoted 
by GCV(/i(2)), as follows: 

i2.9j ^UVift )-n {i_^-itr(5)}2 • 

We select he by minimizing GCY{h^'^'). Like the bandwidth selection in 

Section 2.3, the value of he can be approximated by computing GCY{h^'^') 

through a series of h'^'^K Finally, by substituting he into (2.8), we can cal- 
culate a weighted least squares estimate of Uj(x), denoted by Ui^e{x). 

After obtaining Ui^eix), we can estimate the mean function u(x) and the 
covariance function Su(2;, x'). Specifically, we estimate u(x) and ^u{x, x') by 
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using their empirical counterparts based on the estimated Ui^ei-c) as follows: 

n n 

Ue(x) =n"^^Uj,e(x) and llu{x,x') = {n- Q)~'^'^\li^e{x)\li^e{x') . 
i=l i=l 

We construct a nonparametric estimator of the covariance matrix Se(x,x) 
as follows. Let £i{xj) = vecs(log(«S'j(xj))) — Be{xj)zi — Ui^e{xj) be the esti- 
mated residuals for i = 1, . . . , n and j = 1, . . . , uq- We consider the kernel 
estimate of Tj^{x,x) given by 

(2.10) E.(.,.;,,<»)) = ,„-6)-'EE^|S#^^4|^- 

We pool the data from all n subjects and select an estimated bandwidth 
of h^^> , denoted as hi ■ Let T,s{xj,Xj) = {n — 6)~^Yl^=i^iixj)^i{xj)'^ be an 
estimate of Se(xj,Xj) and T,e{x,x;h^^'y~^' be the leave-one-out weighted 
least squares estimator of 'Ss{x,x). We define a cross-validation score, de- 
noted by CV2(/i^'^)), as follows: 

n no 

{nnGy'^^tT{[siix,f^-±,{xj,xj;h^^^t'Y'^eixj,x,)-'}. 

We select h^^' by minimizing CV2(/i''^^). In practice, within a given range of 
h^^\ the value of hi can be approximated by computing CV2(/i^^^) through 
a series of h^^'. Finally, by substituting hi ' into (2.10), we can calculate a 
weighted least squares estimate of Se(x,x), denoted by Tie,e{x,x). 

2.5. Asymptotic properties. We will use the following theorems to make 
statistical inference on varying coefficient functions. The detailed assump- 
tions of these theorems can be found in Appendix A and their proofs are 
similar to those in Zhu, Li and Kong (2010). Thus, we omit them for the sake 
of space. We need some notation. Let 13{x) = (PB{x)/dx'^ and G(0, S) be 
a Gaussian process with zero mean and covariance matrix function T,(x,x') 
for any x,x' £ [0,Lq]. 

Theorem 1. If the assumptions (C1)-(C6) in the Appendix A hold, then 
V^{vec{B{x;h^^^)-B{x)-0.5u2B{x)h^^^'^[l + Op{l)]):x£ [0,Lo]} ^ Xb{x), 

where => denote weak convergence of a sequence of stochastic processes, 
^Bi') follows a Gaussian process G(0, Su Cg) 0~^), and Q^ is the limit of 



-1 \^n 

1^1=1 



52 



n > , -1 zy as n — )■ oo. 



Theorem 1 establishes weak convergence of B{x; h^^') as a stochastic pro- 
cess indexed by x G [0, Lq] and forms the foundation for constructing a global 
test statistic and simultaneous confidence bands for {B{x) :x G [0,Lo]}. 
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Theorem 2. // the assumptions (C1)-(C7) in the Appendix A hold, 
then 

sup \T,u{x,x';h^^') — Su(3;,x')| = Op(l). 

(x,x')e[0,Lo]2 

Theorem 2 shows the uniform convergence of Tm{x,x';h^^'). This is use- 
ful for constructing global and local test statistics for testing the covariate 
effects. 

2.6. Hypothesis tests. In neuroimaging studies, many scientific questions 
of interest require the comparison of fiber bundle diffusion tensors along fiber 
bundles across two (or more) diagnostic groups and the assessment of the 
development of fiber bundle diffusion tensors along time. Such questions can 
often be formulated as linear hypotheses of B{x) as follows: 

(2.11) Ho:Cvec{B{x))=ho{x) for all x vs. Hi:Cvec{B{x)) ^ho{x), 

where C is a c x 6r matrix of full row rank and bo(x) is a given c x 1 vector 
of functions. 

We propose both local and global test statistics. The local test statistic 
can identify the exact location of a significant location on a specific tract. 
At a given point Xj on a specific tract, we test the local null hypothesis 

Hq{xj) ■.Cvec{B{xj)) =bo(xj) v.s. Hi{xj) ■.Cvec{B{xj)) 7^bo(xj). 

We use a local test statistic Tn{xj) defined by 

(2.12) Tn{xj) = nd{xjf{C{tu{xj,Xj)®n-^)C^}-^d{xj), 

where Clz = n~^ X^ILi^f ^'^'^ d(x) = Cvec(^e(x) — bias(^e(x))) — bo(x). 
Following Fan and Zhang (2000), a smaller bandwidth leads to a smaller 
value of h\as{Be{x))- Moreover, according to our simulation studies below, 
we have found that the effect of dropping h\a£,[B(,{x)) is negligible and, 
therefore, we drop it from now on. 

To test the null hypothesis Hq : C\ecs{B{x)) = bo(x) for all x, we propose 
a global test statistic T„ defined by 

(2.13) Tn= I °Tnix)dx. 

Jo 

Let Gc{-) be a Gaussian process with zero mean and covariance matrix 
function Sc(x,x'), which is the limit of 

{c(Su(x, x) ® n~^)c^}-^/^{c{tu{x, x') (S) n~^)c^} 

x{C(Su(x',x')®A-i)C^}~^/l 
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It follows from Theorem 1 that ^/n{C{T,u{x,x) Cl~^)C'^}~^'^ d{x) con- 
verges weakly to Gc{x). Therefore, it follows from the continuous mapping 
theorem that as both n and nc converge to infinity, we have 

(2.14) Tn^ [ ' Gc{xfGc{x) dx. 

Jo 

Based on the result (2.14), we develop a wild bootstrap method to approx- 
imate the p- value of T„. The detailed steps of the wild bootstrap method 
are given in Appendix B. 

2.7. Confidence band. Based on model (2.16), we construct a confidence 
band for S{B{x),z) = exp{lvecs{B {x)z)) S Sym"'"(3) over x G [0,Lo] for a 
fixed z. Specifically, at a given significance level a, we construct a simulta- 
neous confidence region in the space of SPD matrices for each z based on 
the critical value Cz{a) such that 

(2.15) P{d{S{B{x),z),S{B{x),z)) < C,{a) for all x G [0,Lo]) = 1 - a. 



Note that d{S {B (x) , z) , S {B (x) , z)) = ytr([Ivecs({Se(x) - B(x)}z)]®2). By 
using Theorem 1 , we have that as n — )■ oo , 



V^d{S{B{x),z),S{B{x),z)) =^ ^JtI[{lvecs{XB{x)z)}^% 

We develop an efficient resampling method [Kosorok (2003), Zhu et al. 
(2007a)] to approximately draw random samples from {Xb{x) :x S [0,Lo]}, 
denoted by {XB{xy^' :x ^ [0,Lo]} for g = 1, . . . ,G. The detailed steps of 
such a resampli ng method can be found in Appendix C. Subsequently, we 
can calculate \/ti[{lvecs{XB{x)^3)z)}'^'^] for all g and use them to approxi- 
mate Cz{a) for any given a. 

Moreover, for B{x) = {/3ki{x)), we can construct confidence bands for its 
individual varying coefficient function (3ki{x) for all (A;, /), A; = 1, . . . , 6 and / = 
1, . . . , r. Specifically, at a given significance level a, we construct a confidence 
band for each Pkiix) such that 

(2.16) P(/3^/"(rE) < pklix) < ^^'"(x) for ah x G [0, Lq]) = 1 - a, 

where /3^j'"(x) and f^j^l^ix) are the lower and upper limits of the confidence 
band. Let e^/ be a 6r x 1 vector with the (/ — l)r + A;th element equal to 
1 and all others equal to 0. It follows from Theorem 1 and the continuous 
mapping theorem that 

sup \^{(3ki,eix) - I3ki{x)}\ =;^ sup \eliXBix)\. 
xe[o,Lo] xe[o,Lo] 
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We define the critical point Cki{a) to satisfy -P(sup2,grQ j;^^] \e^iXB{x)\ < 
Cki{a)) = 1 — Q. Thus, a 1 — Q simultaneous confidence band for f3ki{x) 
is given by 

/o1v^ fa I \ Cklia) a / \ ^ Cki{a) 

(2.17) PfcZ.elx) -^,/3kLe[X) + 



n \/n 



Similar to Cz{ct), the critical point Cki{ct) can be approximated as the 1 — a 
empirical percentile of sup^gro^^Qi \eJ^iXB{xY^'\ for all g = 1,. . . ,G. 

3. Simulation study. We conducted a Monte Carlo simulation study to 
examine the finite sample performance of VCDF. At each point Xj along the 
RICFT, the noisy diffusion tensors are simulated according to the following 
model: 

(3.1) Si{xj) = exp{lvecs{B{xj)zi + tjUj(xj) + Ti{xj)ei{xj))), 

where r, and Ti{xj) were independently generated from a A^(0, 1) random 
generator for i = 1, . . . ,n and j = l,...,nG- Specifically, we set n = 96, 
no = 112 and Zj = (1, Gj, GageJ for i = 1, . . . , 96, where Gj and Gage^, re- 
spectively, denote gender and gestational age. To mimic real imaging data, 
we applied our proposed VCDF method to DTs along the RICFT from all 96 
infants in our clinical data to estimate B{x) by Bf,{x), Ui{x) by Ui^eix) via 
(2.8), and £i{x) by ii{x) = vecs(log(5'j(x)) — Be{x)zi — Uj.e(2;)). The curves 
of the varying coefficient functions of B(,{x) are presented in Figure 5. Ac- 
cording to our real data analysis in Section 4, the gestational age effect is 
significant for our clinical data. So we fixed all functions in B{x) at their cor- 
responding functions in Be{x) except that the third column of B{x), denoted 
by (/3i3(a;), . . . , ^^^{x))'^ , was set as c times the third column of Be{x) where 
c is set at different values in order to study the Types I and II error rates 
of our global test statistic in testing the gestational age effect. Figure 2(a) 
displays the simulated diffusion tensors along the RICFT at c = 1 . 

We have five aims in this simulation study. The first aim is to investi- 
gate the consequence of missing an important covariate. According to our 
real data analysis in Section 4, the Gage effect is significant, whereas the 
gender effect is not significant. We fitted two VCDF models, including three- 
covariate (intercept, gender and gestational age) and two-covariate (inter- 
cept and gender) models to smooth the DTs along the RICFT, and compare 
their performance in reconstructing the true DTs along the RICFT. Note 
that the two-covariate model does not include Gage^ as a covariate. Figure 2 
presents the estimated diffusion tensors using the three-covariate model [Fig- 
ure 2(c)] and the two-covariate model [Figure 2(d)]. Inspecting Figure 2(e) 
reveals that the three-covariate model leads a smaller mean geodesic dis- 
tance between the true and estimated DTs compared with the two-covariate 
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Fig. 2. Ellipsoidal representations of the true (a), simulated (b) and estimated (c) (based 
on three covariates) and (d) (based on two covariates) diffusion tensors along the RICFT, 
colored with their FA values. The rainbow color scheme is used with red corresponding to 
low FA value and purple corresponding to high FA value. Each set of 3 rows in (a)-(d) 
represents one tract of 112 DTs and the three rows are read from left to right in the top 
row, right to left in the middle row and then left to right in the bottom row. (e) Mean 
geodesic distances between the estimated and true diffusion tensors (green solid line based 
on three covariates and blue dash-dotted line based on two covariates) along the RICFT. 



model. Thus, the three-covariate model outperforms the two-covariate one 
in recovering the true DTs along the RICFT. 

The second aim is to investigate the finite sample performance of the 
global test statistic T„ based on the whole DT. In neuroimaging studies, 
some scientific questions require the assessment of the development of dif- 
fusion tensors along fiber tracts across time. We formulated the questions 
as testing the null hypothesis Hq : (3i3{x) = ■ ■ ■ = /363(x) = for all x along 
the RICFT. We first fixed c = to assess the Type I error rates for T„, and 
then we set c = 0.2,0.4,0.6, 0.8 and 1.0 to examine the Type II error rates 
for T„ at different effect sizes. 

We applied the estimation procedure of VCDF to the noisy DTs along 
the RICFT. We approximated the p- value of T„, by using the wild bootstrap 
method with G = 1000 described in Appendix B. For each c, we set the 
significance level a at both 0.05 and 0.01 and used 3000 replications to 
estimate the rejection rate of T„. At a fixed a, if the Type I rejection rate is 
smaller than a, then the test is conservative, whereas if the Type I rejection 
rate is greater than a, then the test is anticonservative, or liberal. Figure 3 
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Fig. 3. Simulation study: Types I and II error rates as functions of c. Rejection rates 
of T„ based on the wild bootstrap method are calculated at six different values of the effect 
size c for sample size 96 at the (a) 0.05 and (b) 0.01 significance levels using DTs, FA 
values, MD values, and joint values of FA and MD. 



presents the rejection rates of T„ across all effect sizes at the two significance 
levels (a = 0.05 or 0.01) by using full diffusion tensors. It is observed that 
Type I error rates are well maintained at the two significance levels. In 
addition, the statistical power for rejecting the null hypothesis increases 
with the effect size and the significance level, which is consistent with our 
expectation. 

The third aim is to demonstrate the power gain in using DTs compared 
with the sole use of diffusion properties. For each simulated diffusion tensor 
at c = 0.2,0.4, 0.6,0.8 and 1.0, we calculated its three eigenvalues Ai,A2 and 
A3 and two well-known scalar diffusion properties MD and FA. To compare 
the power of our method based on DTs with other methods based on scalar 
diffusion properties, we applied an existing method for the analysis of diffu- 
sion properties in Zhu et al. (2011) to three different scenarios: (i) FA, (ii) 
MD and (iii) (FA, MD). Then we tested the gestational age effect in each 
scenario. Inspecting Figure 3 reveals that the statistical power for rejecting 
the null hypothesis increases with the effect size and the significance level in 
all scenarios. Moreover, compared with the sole use of diffusion properties, 
the use of DT dramatically increases the statistical power for rejecting the 
null hypothesis. 

The fourth aim is to demonstrate the accuracy gain in estimating scalar 
diffusion properties along fiber tracts by directly modeling the DTs using 
VCDF. We compared two different methods for estimating FA's and MD's, 
here referred to as method A and method B, respectively. The method A 
first applies VCDF to estimate DT's and then calculates the FA or MD 
curve based on the estimated DT's. The method B first calculates the FA's 
or MD's from all SPD matrices and then uses varying coefficient methods 
in Euclidean space to estimate the FA's or MD's. We examined the finite 
sample performance of methods A and B by using the Mean Absolute Biases 
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Fig. 4. Plot of the MAB's of the estimated FA's (a) and MD's (b) using two methods A 
and B based on 3000 replications. The method A, which uses VCDF by directly modeling 
DT's, outperforms the method B m terms of smaller biases m estimating FA and MD 
values. 



(MAB) across all 112 locations, which is defined by 



96 



(3.2) 



MAByj = 96"^ ^ 



i=l 



3000 



3000-^ ^ Y,,j - Yi^ 



s=l 



where Ygij is the estimator of Yij , which can be the estimated FA or MD value 
at the jth location for the ith subject and the sth simulation. Figure 4 reveals 
that method A has the smaller biases in estimating FA and MD values and 
the biases are negligible compared with those obtained using method B. This 
indicates the potential large improvement gained by directly modeling DT 
data over method B. 

The fifth aim is to examine the coverage probabilities of the simultaneous 
confidence bands for all varying coefficient functions /3fcz(x) in B{x) and 
S{B{x),z). We only considered the generated diffusion tensor data at c= 1. 
We constructed the 95% and 99% simultaneous confidence bands for all 
/3fc/(x). Following Fan and Zhang (2000), we used a smaller bandwidth with 
a shrinkage factor 6 to improve the accuracy of the confidence bands. 

Table 1 summarizes the empirical coverage probabilities based on 3000 
replications for a = 0.01 and 0.05. The coverage probabilities are quite close 
to the prespecified confidence levels. Figure 5 presents typical critical values 
of 95% simultaneous confidence regions for vectors of coefficient functions 
h-{x) = (/3fci, • • • , /3fcr)"^) fc = 1, . . . , 6. Figure 6 summarizes the empirical cov- 
erage probabilities for S{B{x),z) based on 3000 replications at a = 0.01 and 
0.05. The coverage probabilities are quite close to the expected confidence 
levels. 

4. Analysis of the right internal capsule fiber tract. We have two spe- 
cific aims for the analysis of the right internal capsule fiber tracts. The 
first one is to compare DTs along the RICFT between the male and female 
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Table 1 

Simulated coverage probabilities for varying coefficient functions in B{x) = [Puii^)) based 

on 3000 replications at the significance levels a = 0.01 and 0.05 







a = 0.05 






a = 0.01 






Intercept 


Gender 


Gage 


Intercept 


Gender 


Gage 




1 = 1 


1 = 2 


1 = 3 


1 = 1 


1 = 2 


1 = 3 


I3ii[x) 


0.9497 


0.9420 


0.9387 


0.9867 


0.9837 


0.9810 


P2l{x) 


0.9440 


0.9443 


0.9383 


0.9843 


0.9907 


0.9857 


Pii{x) 


0.9457 


0.9383 


0.9400 


0.9870 


0.9833 


0.9807 


Pii{x) 


0.9480 


0.9457 


0.9400 


0.9880 


0.9870 


0.9850 


Pu{x) 


0.9437 


0.9350 


0.9350 


0.9870 


0.9873 


0.9823 


Pai (x) 


0.9473 


0.9400 


0.9403 


0.9860 


0.9827 
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Fig. 5. Typical 95% simultaneous confidence bands for varying coefficient functions 
j3ki{x). The solid, dotted and dash-dotted curves are, respectively, the true curves, the 
estimated varying coefficient functions and their 95% confidence bands. 
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Fig. 6. Simulated coverage probabilities for D{z,l3{x)) based on 3000 simulations for 
a = 0.05 (solid lines with diam,ond markers) and a — 0.01 (solid Imes with circle markers), 
(a) for female (b) for male at different gestational ages, respectively. 



groups. The second one is to delineate the development of fiber DTs across 
time. To achieve these two aims, we fitted VCDF to DTs along the RICFT 
with gestational age at MRI scanning and gender as covariates. We applied 
the estimation procedure in Section 2 to estimate B{x), Su(") •) and '^ei'i ")■ 
Then, we constructed the global test statistics T„ and the local test statistics 
Tn{xj) to test the gender effect and the gestational age effect based on DTs 
along the RICFT. The p value of T„ was approximated by using the resam- 
pling method with G = 5000 replications. Finally, we constructed the 95% 
simultaneous confidence bands for the varying coefficient functions (3ki{x)- 

To test the gender and gestational age effects, we calculated the local test 
statistics Tn{xj) and their corresponding p values across all points on the 
RICFT. It is shown in Figure 7(a) that most points do not have — log]^o(p) 
values greater than 1.3 for testing the gender effect. Then, we also computed 
the global test statistic T„ = 797.65 and its associated p-value p = 0.3934, 
indicating no gender effect. Inspecting Figure 7(b) reveals that the — log]^o(p) 
values of Tn{xj) for testing the gestational age effect are extremely signifi- 
cant in the middle part of the RICFT. The global gestational age effect was 
also found to be highly significant with T„ = 5271.7 and its p- value p < 10^^. 
It indicates that DTs along the RICFT are significantly associated with the 
gestational age, even though there is no gender difference among DTs along 
the RICFT. In order to investigate the development of DTs across the ges- 
tational age, we chose a location at arclength = 61.02 and observed that the 
diffusion tensors become anisotropic and their sizes become smaller as gesta- 
tional age increases [Figures 7(b) and (c)]. Recall that the three eigenvalues 
of a DT refiect the magnitude of the diffusion of water molecules along three 
directions parallel to its three eigenvectors and that MD reflects the total 
magnitude of the diffusion of water molecules. To show the decreasing trend 
of DT, we also plotted the curves of all three eigenvalues and MD values 
in Figures 7(e) and (g), respectively, both of which explicitly show that the 
first eigenvalue does not change much, whereas the second, third eigenvalues 
and MD values decrease with the gestational age. In addition, it is observed 
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Fig. 7. (a) The — logiQ(p) values of test statistics Tn{xj) for testing gender or gestational 
age effect of diffusion tensors on the right internal capsule tract, which shows no significant 
gender effect and significant gestational age effect. The ellipsoidal representations of (b) 
raw and (c) smoothed diffusion tensors changing with the gestational age at one location 
(at arclength = 61.02J on the right internal capsule tract with significant gestational age 
effect, colored with FA values. The rainbow color scheme is used with red corresponding to 
low FA value and purple corresponding to high FA value. The plots of three eigenvalues 
(d), FA (e) and MD (f) values at that location. 



from 7(f) that FA increases with gestational age, which indicates that DTs 
become more anisotropic as gestational age increases. 

Figure 8 presents the estimated varying coefficient functions along with 
their 95% simultaneous confidence bands. In Figure 8 all simultaneous con- 
fidence bands contain the horizontal line crossing (0, 0) for the gender effect, 
whereas the horizontal line is out of the 95% simultaneous confidence band 
for /343(x), which indicates the significant gestational age effect. This agrees 
with our previous analysis results based on the global and local test statistics 
for the gender and gestational age effects. 

Finally, Figure 9 presents the 95% critical values for S{B{x),z) and the 
estimated S{B{x),z) along the RICFT across gestational age for female and 
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Fig. 8. 95% simultaneous confidence bands for coefficient functions. The solid curves are 
the estimated coefficient functions and the dashed curves are the 95% confidence bands. 
The thin horizontal line is the line crossing the origin. 



male groups, respectively. Inspecting Figure 9 reveals that the variation of 
S{B(x),z) is larger on the two boundary points (especially on the right side) 
and smaller in the middle. In addition, the apparent trend of DT's changing 
with gestational age is shown at arc-length = 61.02 for both female and male 
groups. 

5. Discussion. In this paper we have developed a functional data anal- 
ysis framework, VCDF, for modeling diffusion tensors along fibber bundles 
in the Riemannian manifold of SPD matrices under the log-Euclidean met- 
ric with a set of covariates of interest. The most important characteristic 
of our method is that it is formulated based on the whole diffusion tensors 
instead of the DT derived scalar quantities and, thus, it can directly handle 
diffusion tensors. In addition, VCDF can characterize the dynamic associa- 
tion between functional DT-valued responses and covariates by using a set 
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Fig. 9. The 95% critical values for S{B{x),z) across gestational ages for female (a) and 
male (b) groups, respectively. The ellipsoidal representation of the estimated S{B{x),z) 
along the right internal capsule tract across gestational ages for female (c) and male (d) 
groups, respectively, colored with FA values. The rainbow color scheme is used with red 
corresponding to low FA value and purple corresponding to high FA value. The displayed 
four rows from the top to the bottom correspond to DTs at arclength 0,31.22,61.02,80.92 
and 116.47. Specifically, the third row shows the apparent trend of DT's changing with 
gestational age for both female and male groups. 



of varying coefficient functions. Compared with the methods based on DT 
derived quantities, such as FA and MD, our method shows the apparent 
superiority in estimating DT derived quantities compared with those based 
on DT derived quantities (Figure 4). One reason is that the DT data which 
is estimated from DWIs is almost biased, whereas the DT derived quantities 
are hnear and nonhnear functions of eigenvalues of DT data, which are very 
different from the ground truth. The other reason is that directly modeling 
DTs along fiber bundles as a smooth SPD process allows us to incorporate 
a smoothness constraint to further reduce noise in the estimated DTs along 
the fiber bundles. This leads to the further reduction of noise in estimated 
scalar diffusion properties along the fiber bundles. In addition, our method 
has the greater statistical power in detecting the effect of covariates of in- 
terest as is shown in Figure 3. One reason is that VCDF is less biased in 
parameter estimation. The other one is that our method accounts for all 
information contained in the DTs along the fiber bundles. 

Several major issues remain to be addressed in future research. All fiber- 
tract-based methods including VCDF are only applicable to these prominent 
white matter tracts and do not account for the uncertainties of tracking 
these fiber tracts. It is important to develop new statistical methods to 
appropriately account for such uncertainties in fiber-tract analysis especially 
for inconspicuous fiber tracts. VCDF is based on the second-order diffusion 
tensor. It may be interesting to extend VCDF to the analysis of high angular 
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resolution diffusion imaging (HARDI), which is important for resolving the 
issue of fiber crossing [Assemlal et al. (2011)]. Furthermore, it would be of 
great interest to extend VCDF to longitudinal studies and family studies. 
Finally, we have treated DTs along fiber tracts as functional responses; it 
would be interesting to treat DTs along fiber tracts as varying covariate 
functions to predict a scalar outcome (e.g., diagnostic group) [Goldsmith 
et al. (2011)]. 

APPENDIX A: ASSUMPTIONS 

Assumption CI. £i{x) and Uj(x) are identical and independent copies 
of SP(0, Sg) and SP(0, Su); respectively. £i{x) and £i{x') are independent 
for any x ^ x' ^ [0,-Lo]- £i{x) and Uj(a;') are independent for any x,x' G 
[0,Lo]. Moreover, with probability one, the sample path of Uj(x) has con- 
tinuous second-order derivative on [0,Lo] and E[s\^>,J.^\Q^l^^ ||uj(x)||2^] < oo 
and E'{sup^g[o,Lo][l|ui(^)ll2 + l|uj(a;)l|2]'"^} < oo for all ri,r2 G (2,oo), where 
II • II 2 is the Euclidean norm. 

Assumption C2. All components of B[x) and ^^(x,^) have continuous 
second-order derivatives on [0,Lo]. The fourth moments of £i{x) are contin- 
uous on [0, Lq]- All components of Tj^i{x,x') have continuous second-order 
partial derivatives with respect to {x,x') € [0,Lo]^. Moreover, Ti^{x,x) and 
Eu(x,x) are positive for all x G [0,Lo]. 

Assumption C3. The points X = {xj,j = 1, . . . , tlg} are independently 
and identically distributed with density function 'tt{x), which has the bounded 
support [0,Lo]. For some constants ttl and vr^/ G (0,oo) and any x G [0,Lo], 
T^L < vr(x) < TTu and 7r(2;) has continuous second-order derivative. 

Assumption C4. The kernel function K{t) is a symmetric density func- 
tion with a compact support [—1,1] and is Lipschitz continuous. 

Assumption C5. The covariate vectors Zj are independently and iden- 
tically distributed with Ezi = ^^ and E'IUzjUI] < oo and that E[7!f ] = ^z 
is invertible. 

Assumption C6. Both n and nc converge to oo, /i^^^ = o(l), nch'^^') — > 
oo, and /i^^^-^l log/i^^^l^-^M < nJ."^/''\ where qi G (2,4). 

Assumption C7. £'[||ej(xj)||2^] < oo for some q2 G (4, oo), max(/i(2), 
/i(3)) = o(l), nG(/i(2) + /i(3)) ^ oo, (/i(2))-4(iogn/n)i-2/92 = o(l), and 
ih(^))-\logn/ 
n)i-2/q2 = o(l). 
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APPENDIX B: WILD BOOTSTRAP METHOD 

We develop the four key steps of the wild bootstrap method for approxi- 
mating the p-value of T„, as follows. 

Step (i): Use the weighted least squares estimation to fit model (2.1) un- 
der the linear constraint specified in Hq, which yields B*{xj). Calculate 
u*g(xj) according to (2.8) and e*^{xj) = vecs(log(S'i(xj))) — Be{xj)*Zi — 
u*g(xj) for i = 1, . . . ,n and j = 1, . . . ,nc- 

Step (ii): Generate a random sample r^ and Ti{xj)^^' from a A^(0, 1) 
random generator for i = l,. . . ,n and j = 1,. . . , uq and then construct 

Si{x,)^3) = exp(Ivecs(4*(x,>, + rl'^vil^ix,) + t,{x ,)^3^ el,{x ,))) . 

Then, based on Si{xjp3' , we recalculate Be{xp^', and d{x)^^' = CBe{x)^^' — 
bo(3;). We compute 

T(f)= [ V„(x)(^)dx, 
Jo 

r„(xj)(s)=nd(xj)(9)^{C(Su(xj,Xj)«)A-i)C^}~M(xj)(^) 

for i = l,...,nG. 

Step (iii) : Aggregate the results of Step (ii) over g = 1, . . . ,G to obtain 

{?nfmax = maxi<j<„g Tn{xj)^3) : gr = 1, . . . , G} and calculate p{xj) = G"^ x 

^ -^ l(T„fmax > Tn{xj)) for each Xj. The p{xj) is the corrected p- value at 
the location Xj. 

Step (iv): Aggregate the results of Step (ii) over g = \,. . . ^G to obtain 
{T^^^ :5 = 1, . . . , G} and calculate p = G'^ X^Jl^ l(Ti^^ > T„). lip is smaller 
than a prespecified significance level a, say, 0.05, then we reject the null 
hypothesis Hq. 

APPENDIX C: RESAMPLING METHOD FOR APPROXIMATING 

GAUSSIAN PROCESS 

Recall that B^{i){x) = [B{x),h'^^'>B{x)\ in (2.4) is a 6 x 2r matrix. It can 
be shown that Bj^(i){x)'^ is given by 

n no 

(C.l) S(/i(^\x)"-^^^J<'^(i)(xj -x)[zi(g)yf^(i){xj - x)]vecs{log{Si{xj))f , 

i=l j=l 

where ^{h^^lx) = ZtiE]^iKhw{xj - x)[zf ^y^^a^ixj -x)'^^]. Thus, we 
can obtain B{x\h^^') as follows: 

(C.2) S(x;/i(i)) = [/,®(l,0)]S^(i)(x). 
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To approximately simulate from the Gaussian process Xb{-), we develop 
a resampling method as follows: 

• Based on B{xj;h^^'), we calculate ri{xj) = Yecs{log{Si{xj))) — B{xj; h^^>)zi 
for i = l, . . . ,n and j = 1, . . . , nc. 

• For g = 1,. . . ,G,we independently simulate {t^ : i = 1, . . . , n} from A^(0, 1) . 

• For g = 1, . . . ,G, we calculate a stochastic process Xsix^^^ given by 

n uq 

where Ci{xj — x; h^^>) = [zj (8) y^(i) [xj — x)] is a 2r x 1 vector. 
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